Incident Report: Frequent Dead RPC URL Alerts for Kava's Public RPC
Date: 2024-01-02
Time: 12:34 (GMT+3)
Duration: 23 minutes
Description
Frequent "Dead RPC URL" alerts were reported for Kava's public RPC. Over roughly half a day, the alerts repeatedly opened and then auto-closed within 15 minutes, and they also reached the maximum count limit while open.
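The open-and-auto-close pattern described above can be detected with a simple flap check. The following is a minimal sketch, not the alerting system's actual logic: the function names, the `(opened_at, closed_at)` event shape, and the threshold of 3 are all hypothetical, while the 15-minute and half-day values come from this report.

```python
from datetime import datetime, timedelta


def count_flaps(events, max_open=timedelta(minutes=15)):
    """Count alerts that opened and closed again within max_open.

    events: list of (opened_at, closed_at) datetime pairs (hypothetical shape);
    closed_at is None for an alert that is still open.
    """
    return sum(
        1
        for opened_at, closed_at in events
        if closed_at is not None and closed_at - opened_at <= max_open
    )


def is_flapping(events, now, window=timedelta(hours=12), threshold=3):
    """Return True if at least `threshold` short-lived alerts opened within the window."""
    recent = [(o, c) for o, c in events if timedelta(0) <= now - o <= window]
    return count_flaps(recent) >= threshold
```

An alert source flagged as flapping could then be grouped into a single notification instead of opening a new alert on every cycle.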
Root Cause
The root cause was identified as Kava's public RPC URL itself: the endpoint was intermittently unavailable over the reported period, which triggered the alerts.
Impact
The frequent alerts could have caused monitoring fatigue or confusion. There was no immediate impact on operations, because Reblok, the primary RPC, remained functional and the public RPC serves only as a backup/fallback.
Timeline
- 12:34 - Andrew reported the issue with frequent Dead RPC URL alerts.
- 12:57 - Aaron acknowledged the problem and suggested considering a swap for a more reliable public RPC.
Lessons Learned
Even backup or fallback systems can cause alert fatigue and need monitoring for reliability. Regular checks and updates to these systems can prevent unnecessary alerts and ensure they are ready to function effectively when needed.
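A regular check of a fallback endpoint could look like the sketch below. It assumes the endpoint speaks Ethereum-style JSON-RPC and answers the standard `eth_blockNumber` method, which this report does not confirm. The function names, the attempt count, and the 0.9 availability floor are hypothetical, and the URL is left as a parameter.

```python
import json
import urllib.request


def probe_json_rpc(url, timeout=5.0):
    """Return True if the endpoint answers eth_blockNumber with a result (assumed API)."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError):
        # Network errors, HTTP errors, timeouts, and invalid JSON all count as "dead".
        return False
    return isinstance(body, dict) and "result" in body


def availability(probe, url, attempts=5):
    """Fraction of probe attempts that succeeded."""
    ok = sum(1 for _ in range(attempts) if probe(url))
    return ok / attempts


def should_replace(ratio, minimum=0.9):
    """Flag the endpoint for replacement when availability drops below the floor."""
    return ratio < minimum
```

Passing the probe in as an argument keeps the availability logic testable without network access. In practice the attempts would be spread over time rather than fired back to back.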
Actions Taken
- Monitored and reported the frequent Dead RPC URL alerts for Kava's public RPC.
- Confirmed that primary operations were not affected, since Reblok remained functional.
- Discussed swapping the unreliable public RPC for a more stable one.
Related Images/Logs
- Escalation link.
Incident Reviewer(s)
- Andrew Prasaath (Reported and followed up on the issue)
- Bedirhan (Provided initial assessment)
- Aaron (Acknowledged and suggested a potential solution)